ABSTRACT


Four models for studying the file allocation problem in
a computer network with distributed data bases were reviewed.

Each  approach  has  merit  but  none  is  totally  satisfactory.  An 
approach  which  would  permit  use  of  all  the  methods  during  the 
design  of  a network  would  be  of  considerable  value  and  could  be 
used  to  establish  a data  base,  get  an  estimate  of  the  number 
of  file  copies  needed,  and  study  the  effect  of  storage  and  other 
constraints  on  file  location. 

INTRODUCTION 

Modern computer/communications technology offers the Navy ADP systems
planner many alternatives for satisfying the computational requirements of
a geographically distributed set of users. Navy directives require that
various alternatives be examined and that the scheme having the lowest
life-cycle cost be adopted. The major alternatives to be evaluated in this
study included a single large computer serving its users by remote
terminals; a set of smaller, independently operated computers; and an
interconnected set of computers or a computer network. Among these schemes,
the computer networks are the most difficult to evaluate. In this report,
a computer network is regarded as a set of geographically separated
processors interconnected by means of a communications network to enable
data and other resources to be shared. It is assumed that the network is
created "from scratch," even though this is rarely the case.

Frank¹ has surveyed the network design problem and has outlined the
important areas in which more research is required. He classifies the
problems as of two types:

a) Those that are difficult but tractable, such as

. the analysis of tradeoffs between circuit switching and
packet switching,

. the design of routing and flow control techniques for
packet switching with priorities,

. the design of networks which are relatively insensitive
to changes in network traffic,

. the analysis of tradeoffs between centralized control and
distributed data bases,

. the analysis of network survivability and reliability, and

. the realization of realistic cost models that consider the
amortization of capital investment, cost of operation, and cost of
maintenance.

b) Those whose solutions require major breakthroughs, such as

. a theory which will permit the analysis of dynamic
characteristics of a computer network. Current methods usually
require static approximations,

. a theory of network measurement for monitoring network
performance,

. techniques to permit the prediction of gross network
performance, and

. methods that will permit network participants to
selectively limit access to their resources (an increasingly
important feature).

¹ Frank, H., "Computer Networks," Networks, Vol. 5, No. 1, pp. 69-73
(Jan 1975). A complete listing of reports is given on page 19.

This report was prepared to assist the ADP systems planner in
evaluating some of the alternatives which arise when tradeoffs between
centralized control and distributed data bases are being considered.

Specifically, the report summarizes and evaluates four models for
determining the optimum location of files within a computer network. These
four models may be characterized as follows:

1. Cost minimization with file access time as a constraint, which
leads to a linear zero-one programming problem (Chu²)

2. Access time minimization with cost as a constraint, which leads
to an integer programming problem (Chandy and Hewes³)

3. Cost minimization with no constraints solved by enumeration
(Casey⁴)

4. Various heuristic methods which may involve interaction and
which are generally useful only for hierarchical networks (Casey and
Friedman,⁵ Chang⁶)

Short descriptions of each of these methods follow. Details of the four
systems are included in the appendix.

² Chu, Wesley W., "Optimal File Allocation in a Multiple Computer
System," IEEE Transactions on Computers, Vol. C18, No. 10 (Oct 1969).

³ Chandy, K.M. and J.E. Hewes, "File Allocation in Distributed Systems,"
Proceedings of the International Symposium on Computer Performance
Modeling, Measurement, and Evaluation (Mar 29-31, 1976) Harvard Univ.,
Cambridge, Mass.

MODELS  FOR  FILE  ALLOCATION 

CHU's  MODEL 

The  model  described  in  Chu  deals  with  the  following  problem: 

"Given  a number  of  computers  that  process  common  information 
files,  how  can  one  allocate  the  files  so  that  the  allocation 
yields  minimum  overall  operating  costs  subject  to  the  following 
constraints: 

1)  The  expected  time  to  access  each  file  is  less  than  a given  bound. 

2)  The  amount  of  storage  needed  at  each  computer  does  not  exceed 
the  available  storage  capacity." 

In this mathematical model, he describes an information system having a set
of nodes of specified storage capacity connected pair-wise with pairs of
transmission lines, one for queries and one for replies. Although this
interconnection rarely exists in practice and the request rate for a given
file by a given node is rarely a constant, Chu's model appears to be
realistic, at least for circuit switched networks. Additional assumptions
would be necessary for message-switched or packet-switched networks. Chu
makes two assumptions which do not appear to be critical: 1) the number of
copies of each file is known in advance, and 2) queries are routed to all
copies of the file. As pointed out by Levin,⁷ a network with 30 nodes
and 10 files would require about 9000 zero-one variables and 18,000
constraints. Chu's model obviously should not be used for such cases.

⁴ Casey, R.G., "Allocation of Copies of a File in an Information Network,"
AFIPS 1972 Spring Joint Computer Conference, Vol. 40.

⁵ Casey, R.G. and T.D. Friedman, "Design Techniques for Database-
Oriented Computer Networks," IBM Research Rept. RJ 1222 (May 1973).

⁶ Chang, S.K., "An Interactive Configurator for Distributed Computer
System Design," IBM Research Rept. RC 5327 (Mar 1975).
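The problem size quoted from Levin can be checked with a little arithmetic.
The sketch below rests on an assumption that is not stated in the report:
that one zero-one variable is needed for each (requesting node, supplying
node, file) triple when the product terms of the model are linearized, and
that each such variable brings two constraints with it. The function name
is hypothetical.

```python
def chu_problem_size(n_nodes: int, n_files: int) -> tuple[int, int]:
    """Return (zero-one variables, constraints) under the assumed scaling."""
    # Assumption: one 0-1 variable per (i, k, j) triple, i.e. n * n * m.
    variables = n_nodes * n_nodes * n_files
    # Assumption: two linearizing constraints per such variable.
    constraints = 2 * variables
    return variables, constraints


print(chu_problem_size(30, 10))  # (9000, 18000)
```

Under this reading the count grows with the square of the number of nodes,
which is why the formulation becomes impractical so quickly.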

CHANDY's  MODEL 

Chandy  considers  the  file  allocation  problem  at  several  levels  of 
detail.  At  the  first  level  of  detail,  the  number  of  copies  of  a single 
file  is  to  be  determined,  given  the  frequencies  of  queries  and  updates  and 
neglecting  storage  costs.  Average  access  time  is  to  be  minimized.  At 
the  second  level  of  detail,  total  storage  cost  is  a constraint  and  several 
different files may be present. At the third level of detail, hierarchies
are also considered. In this model, a memory is characterized by its cost,
capacity, and average retrieval time. The mathematical formulation for
the third level of detail was not given in the preliminary version of the
paper.

One  of  the  interesting  aspects  of  Chandy's  formulation  is  that,  in 
contrast  to  all  other  models  studied,  the  average  access  time  is  minimized 
and  storage  cost  is  a constraint. 

Chandy formulates the problem as an integer programming problem and
notes the following with respect to its solution:

. In  95  percent  of  the  cases,  the  solution  to  the  linear  programming 
problem  satisfies  integrality  constraints. 

. A heuristic  based  on  hill-climbing  yielded  optimum  solutions  in  95 
percent  of  the  cases. 

. The  heuristic  provides  an  upper  bound  on  the  minimum  cost  of  the 
system  while  the  linear  program  provides  a lower  bound.  In  95  percent  of 
the  cases  the  costs  are  the  same. 
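The relationship between a heuristic answer and the true optimum can be
illustrated on a toy instance. The sketch below is not Chandy's heuristic;
it is a generic one-bit-flip hill climb on a hypothetical single-file
problem (all data values are invented for illustration), compared against
exhaustive enumeration. Whatever feasible placement the hill climb stops
at, its objective value cannot be better than the enumerated optimum, so it
serves as an upper bound on the minimum.

```python
from itertools import product

# Hypothetical single-file instance with 4 sites (illustrative numbers only).
Q = [5.0, 3.0, 4.0, 2.0]           # query rate generated at each site
D = [6.0, 6.0, 6.0, 6.0]           # update time charged per stored copy
T = [[0, 2, 4, 3],                 # T[i][j]: response time when site j
     [2, 0, 3, 4],                 # reads the copy held at site i
     [4, 3, 0, 2],
     [3, 4, 2, 0]]
COST = [10.0, 10.0, 10.0, 10.0]    # storage cost of a copy at each site
BUDGET = 30.0                      # storage cost constraint


def feasible(x):
    """At least one copy exists and the storage budget is respected."""
    return any(x) and sum(c for c, xi in zip(COST, x) if xi) <= BUDGET


def access_time(x):
    """Query time to the nearest copy plus update time over all copies."""
    sites = [i for i, xi in enumerate(x) if xi]
    query = sum(Q[j] * min(T[i][j] for i in sites) for j in range(len(x)))
    update = sum(D[i] for i in sites)
    return query + update


def hill_climb(start):
    """Flip one placement bit at a time while the objective improves."""
    current = tuple(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            cand = current[:i] + (1 - current[i],) + current[i + 1:]
            if feasible(cand) and access_time(cand) < access_time(current):
                current, improved = cand, True
    return current


def exhaustive():
    """Exact optimum by enumeration; only practical for tiny instances."""
    return min((x for x in product((0, 1), repeat=len(Q)) if feasible(x)),
               key=access_time)


heuristic = hill_climb((1, 0, 0, 0))
optimum = exhaustive()
print(heuristic, access_time(heuristic), optimum, access_time(optimum))
```

On an instance this small the two answers can simply be compared; on larger
ones the linear programming relaxation mentioned above would have to play
the role of the lower bound.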

CASEY's  MODEL 

Casey describes a mathematical model of an information system having a
set of nodes, some of which contain copies of a given file. Each node is
assumed to be linked to all the other nodes in the network either directly
or possibly via intermediate nodes. Two types of transactions are processed
by the network: queries and updates. Queries are assumed to be directed
to one copy of the file while updates are transmitted to all copies of the
file in the network. The model considers file storage costs, volume of
queries and updates, and communications costs for queries and for updates.
For most of his discussion, Casey assumes query and update costs to be
equal. Communications costs are assumed to vary linearly with distance.
Using this rather simple model, the optimization problem considered is to
locate the files within the network so that the costs of storage and
communications are minimized. Casey derives two interesting results in his
paper: 1) If p is the update/query ratio and r is the number of
replications of a file in the network, then, if p > 1/(r-1), the optimal
allocation would be that no more than r nodes should have a copy of the
file. 2) A method for finding the optimal allocation is described which is
considerably better than a simple exhaustive search. This model will be most
useful in the early stages of network design.

⁷ Levin, K.D., "Organizing Distributed Data Bases in Computer Networks,"
Univ. of Pennsylvania, The Wharton School Report 74-09-01 (1974).
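Casey's first result, as stated above, can be turned into a quick
calculation: for a given update/query ratio p, find the smallest r for which
p > 1/(r-1) holds and read it as a cap on the number of copies. The sketch
below follows the inequality exactly as it is worded in this report; the
function name is hypothetical.

```python
def casey_copy_bound(p: float) -> int:
    """Smallest integer r >= 2 with p > 1/(r - 1), read as a cap on copies."""
    if p <= 0:
        raise ValueError("update/query ratio must be positive")
    r = 2
    # Increase r until the strict inequality p > 1/(r - 1) is satisfied.
    while not p > 1.0 / (r - 1):
        r += 1
    return r


for ratio in (2.0, 1.0, 0.5, 0.25):
    print(ratio, casey_copy_bound(ratio))
```

The loop shows the expected trend: the more update traffic there is
relative to queries, the fewer copies are worth keeping.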

CHANG's  MODEL 

Chang has proposed an extremely detailed mathematical model of a
distributed computer network. The model permits a concise statement of the
design problem. Chang actually formulates three distributed design
problems:

1.  Given  that  processors  and  communication  lines  have  been  sized  and 
located,  tasks  and  files  are  to  be  located  and  a routing  strategy 
determined.  Chang  calls  this  the  transaction  allocation  problem. 

2.  Given  that  locations  of  files  and  processing  steps  have  been 
assigned,  processors  and  communications  lines  are  to  be  sized  and  located. 
Chang  calls  this  the  network  design  problem. 

3. Given a partially specified system, i.e., one in which some
processors and lines have been installed and some files allocated, the
configuration is to be completed. This Chang calls the system configuration
problem.

The major contribution which Chang makes is to suggest a heuristic
design procedure for a hierarchical network having the following
characteristics:

. Tree-like  topological  structure 

. Several  levels  of  processors  with  a single  top-level  processor 

. All  processors  on  a given  level  communicate  to  the  next  higher 
level  with  full-duplex  communications  of  the  same  capacity 

. All  tasks  originate  and  terminate  at  the  same  location. 

There  are  three  steps  in  Chang's  procedure: 

. Transaction  allocation,  in  which  the  processing  steps  of  the 
system  tasks  are  allocated  to  the  various  levels  of  the  network 
hierarchy;  file  allocation  is  part  of  this  step. 

. Processor  allocation,  in  which  the  processing  steps  to  be 
performed  on  a particular  level  of  the  network  are  allocated  to  the 
processors  on  that  level. 

. Line allocation, in which the exact locations of the
processors are determined so that the total line cost can be minimized.

Transaction allocation is performed with the aid of a transaction
allocation table supplied by the designer and containing the following
information:

(a) For each processor transaction-step pair, the time delay
required to process the transaction and the availability of the
processor for the transaction are given.

(b)  For  each  communication-line  transaction-step  pair,  the  time 
delay  involved  in  transmitting  the  step  and  the  availability  of  the 
line  for  the  step  are  given. 

The  Chang  technique  considers  response  time  and  availability 
requirements  before  cost  minimization. 

In a later paper written by Chang and Tang,⁸ another technique for
allocating files is described which considers the trade-offs between
storage cost and transmission cost. It would be useful to incorporate
such a technique into the model considered here.

⁸ Chang, S.K. and D.T. Tang, "Processor Allocation in a Distributed
Computer System," Proceedings of International Conference on Management of
Data, San Jose, Calif. (May 1975).

CASEY/FRIEDMAN MODEL


Casey and Friedman have developed a set of APL functions which
they found useful in a study of the network design problem. The functions
were designed to permit interactive creation, modification, and evaluation
of graph-theoretic representations of computer networks. The properties
of computer networks with leased lines as well as dial-up lines may be
calculated with the functions provided.

CONCLUSION 

Each  of  the  four  approaches  to  the  file  allocation  problem  has  merit 
but  none  is  totally  satisfactory.  An  approach  which  would  permit  the 
analyst  to  use  all  the  methods  during  the  design  of  the  network  would  be 
of  considerable  value.  Such  a polyalgorithm  could  be  used  to 

. establish a data base, using the logical structure described by
Chandy;

. get a quick estimate of the number of file copies required and
an idea of where the local minima in the cost function are located,
using Casey's method;

. get an alternative estimate of the number of file copies,
using Chandy's method;

. study the effect of storage constraints on file location,
using Chu's method;

. consider (when desired) the effect of additional constraints
on file location, such as those that may be required in a heterogeneous
network, using the interactive APL functions described by Casey and
Friedman.⁵

If the network being studied is hierarchical, then the effect of
additional constraints may be studied using the heuristics and functions
described by Chang.⁶ For truly large networks, none of the methods is
completely satisfactory, although Chandy's method may be adequate. Casey's
enumerative method may prove too costly.


APPENDIX  A 

DETAILED  DESCRIPTIONS  OF  FILE  ALLOCATION  MODELS 


CHU's MODEL:

Given Data:

$i$          $1, 2, \ldots, n$, where $n$ is the number of computers in the system
$j$          $1, 2, \ldots, m$, where $m$ is the number of distinct files in the system
$a_{ijk}$    Expected time for the $i$th computer to retrieve the $j$th file from the $k$th computer
$b_i$        Storage capacity of the $i$th computer
$C_{ij}$     Storage cost per unit length and unit time of the $j$th file at the $i$th computer
$C'_{ik}$    Transmission cost from the $k$th computer to the $i$th computer per unit length
$l_j$        Length of each transaction for the $j$th file
$L_j$        Storage required for the $j$th file
$P_j$        The frequency of modification of the $j$th file after each transaction
$r_j$        The number of redundant copies of the $j$th file
$T_{ij}$     Maximum allowable average retrieval time for the $j$th file to the $i$th computer
$u_{ij}$     Average usage rate of the $j$th file at the $i$th computer

Variables:

$X_{ij} = 1$ if the $j$th file is stored in the $i$th computer; $0$ otherwise

Objective Function:

$$\sum_{i}\sum_{j} C_{ij} L_j X_{ij} + \sum_{i \neq k}\sum_{j} C'_{ik}\, l_j\, u_{ij}\, X_{kj}\,(1 - X_{ij})$$

Constraints:

1) $\sum_{j} X_{ij} L_j \le b_i$ for all $i$
(storage capacity not exceeded)

2) $(1 - X_{ij})\, X_{kj}\, a_{ijk} \le T_{ij}$ for all $j$, $i \neq k$
(maximum allowable retrieval time not exceeded)

Assumptions: 

(1)  One  pair  of  transmission  paths  links  each  pair  of  computers;  one 
path  for  traffic  from  computer  i to  computer  j and  one  path  for  traffic  to 
computer  i from  computer  j . 

(2) The expected time for the $i$th computer to retrieve the $j$th file
from the $k$th computer is $a_{ijk} = w_{ik} + w_{ki} + t_{kj}$, where
$w_{ik}$ is the expected queuing delay at the $i$th computer for the channel
to the $k$th computer, $w_{ki}$ is the expected queuing delay at the $k$th
computer for the channel to the $i$th computer, and $t_{kj}$ is the expected
computer access time to the $j$th file. Chu assumes $t_{kj}$ can be
neglected.

(3) A file residing on a given computer may be accessed simultaneously
by that computer and by a remote computer.

Discussion:

The assumption that the computers in the network are connected
pair-wise with a pair of transmission paths is rarely if ever true in
practice.
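To make the structure of Chu's model concrete, the sketch below evaluates a
candidate allocation. It assumes the objective is storage cost plus the
transmission cost of remote accesses, and that the two constraints are the
storage-capacity and retrieval-time bounds described in this section. The
function names, argument order, and the two-computer data are hypothetical;
the arguments mirror the quantities listed under Given Data.

```python
def chu_cost(X, C, Cp, L, l, u):
    """Storage cost plus transmission cost for allocation X[i][j] in {0, 1}."""
    n, m = len(X), len(X[0])
    storage = sum(C[i][j] * L[j] * X[i][j]
                  for i in range(n) for j in range(m))
    # Computer i pays to fetch file j from computer k only when k holds
    # a copy and i does not.
    transmission = sum(Cp[i][k] * l[j] * u[i][j] * X[k][j] * (1 - X[i][j])
                       for i in range(n) for k in range(n) if i != k
                       for j in range(m))
    return storage + transmission


def chu_feasible(X, L, b, a, T):
    """Check the storage-capacity and retrieval-time constraints."""
    n, m = len(X), len(X[0])
    storage_ok = all(sum(X[i][j] * L[j] for j in range(m)) <= b[i]
                     for i in range(n))
    time_ok = all((1 - X[i][j]) * X[k][j] * a[i][j][k] <= T[i][j]
                  for i in range(n) for j in range(m)
                  for k in range(n) if i != k)
    return storage_ok and time_ok


# Hypothetical two-computer, one-file instance; the file sits on computer 0.
X = [[1], [0]]
C = [[2.0], [3.0]]                  # storage cost rates
Cp = [[0.0, 0.5], [0.5, 0.0]]       # transmission cost rates
L = [10.0]                          # file length
l = [4.0]                           # transaction length
u = [[1.0], [2.0]]                  # usage rates
b = [10.0, 10.0]                    # storage capacities
a = [[[0.0, 3.0]], [[3.0, 0.0]]]    # a[i][j][k] retrieval times
T = [[5.0], [5.0]]                  # retrieval-time bounds

print(chu_cost(X, C, Cp, L, l, u), chu_feasible(X, L, b, a, T))
```

A zero-one programming code would search over X; the functions here only
score one candidate, which is enough to see how each term contributes.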


CASEY's  MODEL 


Given Data:

$n$          Number of nodes
$\sigma_k$   Cost (in dollars per month) of storing a copy of the file in question at the $k$th node
$\lambda_j$  Volume of query traffic emanating from node $j$ ($j = 1, 2, \ldots, n$) (in megabits per month)
$\psi_j$     Volume of update traffic emanating from node $j$ ($j = 1, 2, \ldots, n$) (in megabits per month)
$d_{jk}$     Cost (in dollars per megabit) of a unit of communication from node $j$ to node $k$ for a query
$d'_{jk}$    Cost (in dollars per megabit) of a unit of communication from node $j$ to node $k$ for an update

Variables:

$I$ is an index set representing a file assignment. If $I(k) = 1$, then a
copy of the file is located at node $k$ (written $k \in I$ below).

Objective Function:

$$C(I) = \sum_{j=1}^{n}\Big[\sum_{k \in I} \psi_j\, d'_{jk} + \lambda_j \min_{k \in I} d_{jk}\Big] + \sum_{k \in I} \sigma_k$$

Discussion: 

This  model  can  be  quite  useful  in  spite  of  its  simplicity.  Casey 
describes  a parameter  study  involving  finding  the  optimal  allocation  for 
five different update/query ratios for a nineteen-node network.
Calculating the optima required only ten seconds on an IBM 360/91.
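A parameter study of this kind can be imitated on a very small network. The
sketch below assumes the cost of an assignment is the storage cost of every
copy, plus update traffic sent to every copy, plus query traffic sent to the
cheapest copy. It uses plain enumeration of all assignments rather than
Casey's more efficient search, and the three-node data are hypothetical.

```python
from itertools import combinations


def casey_cost(I, sigma, lam, psi, d, dp):
    """Cost of holding copies at the nodes in I (0-based node indices)."""
    nodes = range(len(lam))
    storage = sum(sigma[k] for k in I)
    update = sum(psi[j] * dp[j][k] for j in nodes for k in I)
    query = sum(lam[j] * min(d[j][k] for k in I) for j in nodes)
    return storage + update + query


def best_assignment(sigma, lam, psi, d, dp):
    """Enumerate every non-empty assignment and return the cheapest one."""
    n = len(sigma)
    candidates = (I for r in range(1, n + 1)
                  for I in combinations(range(n), r))
    return min(candidates,
               key=lambda I: casey_cost(I, sigma, lam, psi, d, dp))


# Hypothetical three-node line network; query and update costs are equal.
d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
sigma = [1, 1, 1]

query_heavy = best_assignment(sigma, [10, 10, 10], [1, 1, 1], d, d)
update_heavy = best_assignment(sigma, [1, 1, 1], [10, 10, 10], d, d)
print(query_heavy, update_heavy)
```

With query-dominated traffic every node keeps a copy, while with
update-dominated traffic a single copy at the middle node is cheapest,
which is the qualitative behavior the update/query ratio is meant to
capture.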


CHANDY's  MODEL 

This  summary  considers  access  time  minimization  with  a cost  constraint 
for  multiple  files. 

Given Data:

$d_{jk} = \sum_{i} r_{ijk}\, U_{ik}$   Average time spent in updating copy $j$ of file $k$ per unit time due to updates at all sites
$q_{jk}$     Rate at which queries are generated at node $j$ for file $k$
$r_{ijk}$    Time required to implement an update generated at site $i$ on a copy of file $k$ at site $j$
$t_{ijk}$    Response time for a query at node $j$ for file $k$ if the query is satisfied by accessing a copy at node $i$. If $i = j$, $t_{ijk} = 0$
$C_{ik}$     Cost of storing a copy of file $k$ in site $i$; $k = 1, \ldots, K$, $i = 1, \ldots, N$
$C_{ijk}$    Cost of replying to a query generated at site $j$ regarding file $k$, if a copy of the file in site $i$ is accessed to satisfy the query
$G$          System cost
$K$          Number of files
$N$          Number of sites
$U_{jk}$     Rate at which updates are generated at node $j$ for file $k$

Variables:

$X_{jk} = 1$ if a copy of file $k$ is stored at node $j$; $0$ otherwise

$Y_{ijk} = 1$ if a query generated at site $j$ for file $k$ is satisfied by
accessing a copy of the file at site $i$; $0$ otherwise

Objective Function:

Minimize
$$T = \sum_{i \neq j}\sum_{j}\sum_{k} q_{jk}\, t_{ijk}\, Y_{ijk} + \sum_{j}\sum_{k} d_{jk}\, X_{jk}$$

Subject to:

$$\sum_{i \neq j}\sum_{j}\sum_{k} C_{ijk}\, Y_{ijk} + \sum_{j}\sum_{k} C_{jk}\, X_{jk} \le G$$

$$\sum_{i \neq j} Y_{ijk} + X_{jk} = 1 \quad \text{for all } j, k$$

$$Y_{ijk} \le X_{ik} \quad \text{for all } i, j, k,\ i \neq j$$

$$Y_{ijk},\ X_{jk} \ \text{non-negative integers}$$


CHANG's  MODEL 


Given Data:

The set $A$ of stations at which processors are to be located; the
coordinates of station $A_i$ are $(X_i, Y_i)$.

The set $\Pi$ of processors to be installed at the stations. A dummy
processor may be assigned to a station if it is not to have
any processing capacity.

A processor $\Pi_j$ has a capacity $d_j$, which is a vector containing
a parametric representation of the capacity of the processor; it includes
such items as throughput, direct-access storage, number of
terminals attached, etc.

The cost of processor $\Pi_j$.

The cost per thousand bytes of file storage for processor $\Pi_j$.

The set $E$ of communications lines to be used to interconnect
stations. A dummy line of zero capacity is included in $E$
to represent two stations not interconnected.

The capacity of line $e_k$ is represented by the vector $b_k$,
which contains such parameters as line speed, line buffer
capacity, etc.

The cost per mile of a communication line.

The set $S$ of transactions (jobs, work load) to be processed by the
distributed system. Each transaction may consist of several
transaction steps $s_i$. Each of these steps (1) may be processed
at a specified station, (2) requires a specified file (assumed
to be at that station), and (3) may require the transmission of
information to another station.

The set $D$ of files required by the transactions. Each file $D_i$ has
a length $l_i$; if it is not to be stored in a mass storage
device, $l_i = 0$.

Variables:

Two functions must be found which assign processors to stations and
which allocate lines between processors.

The processor allocation function is a mapping from $A$ to $\Pi$.
The line allocation function is a mapping from $A \times A$ to $E$.

These two mappings specify the hardware configurations of the network. Two
additional mappings are required to complete the specification and permit
the total system cost to be calculated.

A mapping from $S$ to $A$. It assigns each transaction step to a
station.

A mapping from $S \times S$ to $2^{A \times A}$. For each pair of
processing steps, a routing path is specified which describes the path
along which information between the two steps is transmitted.


Objective Function:

Total system cost is given as the sum of processor cost, line cost,
and storage cost.

The processor cost is the sum, over all stations $A_i$, of the cost of
the processor allocated to $A_i$.

The line cost is the sum, over all pairs of stations $(A_{i1}, A_{i2})$
joined by a line $e_k$, of $q_k\,|A_{i1} - A_{i2}|$, where $q_k$ is the
cost per mile of line $e_k$ and $|A_{i1} - A_{i2}|$ is the distance between
the two stations.

The storage cost is the sum, over all files, of the file length
multiplied by the file storage cost per thousand bytes for the processor
which is allocated to the station holding the file.

Chang has described a general model for distributed computer systems
and has devised a heuristic procedure for generating feasible (satisfying
constraints) configurations. The model is far too complex to permit
optimization of a function such as cost. A set of APL functions is
described to permit experimenting with the procedure for hierarchical
computer systems.
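The line-cost component of the total system cost is the easiest one to
compute directly. The sketch below assumes the cost of a line is its cost
per mile times the straight-line distance between the two stations it
joins, with station positions given as coordinates in miles. The data
layout, names, and numbers are hypothetical and are not taken from Chang's
APL functions.

```python
import math


def line_cost(stations, links, cost_per_mile):
    """Sum cost-per-mile times distance over every installed line.

    stations:      {name: (x, y)} coordinates in miles
    links:         [(station_a, station_b, line_type), ...]
    cost_per_mile: {line_type: dollars per mile}
    """
    return sum(cost_per_mile[line_type]
               * math.dist(stations[a], stations[b])
               for a, b, line_type in links)


# Hypothetical three-station configuration.
stations = {"A1": (0.0, 0.0), "A2": (3.0, 4.0), "A3": (3.0, 0.0)}
links = [("A1", "A2", "fast"), ("A2", "A3", "slow")]
cost_per_mile = {"fast": 2.0, "slow": 1.0}

print(line_cost(stations, links, cost_per_mile))
```

Processor and storage costs would be added in the same way once the
processor allocation for each station is known.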

Constraints: 

Chang  describes  five  constraints  which  are  stated  in  words  as  follows: 

1.  Routing  Constraint.  The  transmission  path  between  two  processing 
steps  should  originate  where  the  first  processing  step  is  located  and  should 
terminate  where  the  second  processing  step  is  located.  The  transmission 
path  should  not  contain  loops. 

2.  Processor  Capacity  Constraint.  The  total  processing  requirements 
of  all  transaction  steps  assigned  to  a station  should  not  exceed  the 
capacity  of  the  processor  assigned  to  that  station. 

3. Line Capacity Constraint. Between two stations $A_{i1}$ and $A_{i2}$,
the transmission requirement should not exceed the line capacity.

4. Line Slot Constraint. The number of lines connected to a
processor should not exceed the number of slots available at the processor.

5. "Compatibility" Constraint. Let $\Pi_i$ be the processor allocated
to $A_i$. Both the file and the function performed on the file must be
compatible with the processor.

FRIEDMAN'S  MODEL 

The network under consideration is represented by a link list. If
the network has n nodes which are interconnected by m transmission links,
then there will be m entries in the link list. Each entry consists of a
unique link identification number followed by the identification numbers
of the two nodes it connects. The user provides the link list and
identifies which nodes contain copies of the data file which is to be
accessed. Other data which must be provided include

. the average amount of query and update activity emanating from
each node to each file,

. a table of discrete transmission capacities,

. the cost for maintaining each possible link at each possible
capacity, and

. the cost to maintain any file at any node.

With the preceding information, the total cost as the sum of storage
and transmission costs may be calculated.

In order to study the effects of changes on the behavior of the
network, several APL functions are provided which remove, add, or replace
transmission links. No functions for relocating the files among the nodes
seem to have been provided although this should be relatively simple. A
complete list of functions for leased line networks is given below. A
comparable set of functions for a dial-up network is also given in the
report.

APL Functions:

FORMCM      Produces connection matrix, given the link list and
            node count.

OKCM        Determines whether all nodes are connected.

CSP         Looks for shortest path spanning two nodes in a given
            network.

FILACT      Provides a table containing average amount of query and
            update activity emanating from each node.

LUACT       Calculates the activity on each link of the network after
            location of files on network has been specified.

PATH        Finds the set of links connecting a node to the nearest
            copy of a file located at one or more nodes.

LQACT       Uses PATH to find total query activity on all links.

TARTABLE    Indicates the cost for maintaining each possible link at
            each possible capacity in network.

FMAINT      Indicates the cost of maintaining any file at any node.

FORMLTAR    Uses TARTABLE to calculate monthly cost of all links
            (called LTAR).

TARRIF      Uses LTAR and FMAINT to calculate monthly cost of
            configuration, given the nodes at which files are located.

GETTARRIF   Given the link list and locations of files, calls FORMCM,
            LQACT, LUACT, MINCAP, FORMLTAR and TARRIF.

SNIP        Using the link list and locations of the file nodes,
            removes links whenever such removal will result in the
            same or lowered costs.

JOIN        Adds to a specified node the one link, if any, which will
            result in the greatest cost reduction. Continues with
            that node as long as cost decreases.

JOINALL     Accomplishes the same function as JOIN on a network-wide
            basis. A link will be added anywhere in the
            network if it reduces costs.

REPLACE     Substitutes new links for old ones whenever this will
            reduce costs.
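The first three functions in the list operate directly on the link list
described earlier. The sketch below is a Python analogue, not a
transcription of the APL code: it builds a connection matrix from entries
of the form (link id, node, node), tests whether all nodes are connected,
and finds a path between two nodes. Measuring "shortest" by hop count is an
assumption, and the lower-case function names are hypothetical.

```python
from collections import deque


def form_cm(link_list, n):
    """Connection matrix from (link_id, node_a, node_b); nodes are 1..n."""
    cm = [[0] * n for _ in range(n)]
    for _link_id, a, b in link_list:
        cm[a - 1][b - 1] = cm[b - 1][a - 1] = 1
    return cm


def ok_cm(cm):
    """True when every node can be reached from node 1."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(len(cm)):
            if cm[u][v] and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(cm)


def shortest_path(cm, src, dst):
    """Fewest-hop path between two 1-based nodes, or None if unreachable."""
    prev = {src - 1: None}
    queue = deque([src - 1])
    while queue:
        u = queue.popleft()
        if u == dst - 1:
            path = []
            while u is not None:      # walk predecessors back to the source
                path.append(u + 1)
                u = prev[u]
            return path[::-1]
        for v in range(len(cm)):
            if cm[u][v] and v not in prev:
                prev[v] = u
                queue.append(v)
    return None


# Hypothetical four-node chain: link 1 joins nodes 1-2, and so on.
links = [(1, 1, 2), (2, 2, 3), (3, 3, 4)]
cm = form_cm(links, 4)
print(ok_cm(cm), shortest_path(cm, 1, 4))
```

Functions such as SNIP or JOIN could then be imitated by editing the link
list, rebuilding the matrix, and re-evaluating the cost of the result.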